Abstract — In this work we present results about the rate of
(relative) information loss induced by passing a real-valued,
stationary stochastic process through a memoryless system. We
show that for a special class of systems the information loss
rate is closely related to the difference of differential entropy
rates of the input and output processes. It is further shown
that the rate of (relative) information loss is bounded from
above by the (relative) information loss the system induces on a
random variable distributed according to the process's marginal
distribution.

As a side result, we present conditions under which the
output process of a Markovian input process also possesses
the Markov property.

Index Terms — data processing inequality, information loss,
entropy rate, Rényi information dimension, system theory,
lumpability

I. Introduction 

Signal processing, as defined by many textbooks, is related 
to the "representation, transformation, and manipulation of 
signals and the information the signals contain" [1, emphasis
added]. Yet, most of these textbooks leave the notion of
information completely aside and focus, instead, on purely
energetic aspects or second-order statistics: transfer functions
for linear filters, their effect on the auto-correlation function of
the output signal, and similar results for nonlinear, memoryless
systems (e.g., [2]) are popular characterizations. However,
except for the purely Gaussian case, energy (or second-order
statistics) and information show an inherently different
behavior. It is therefore desirable to extend current system
theory by information-theoretic aspects.

While the data processing inequality (e.g., [3, p. 35]) captures
the fact that deterministic functions of random variables
(RVs) destroy information, relatively little has been done to
quantify this information loss. Pinsker showed that the entropy
rate of a function of a stationary stochastic process on a
finite alphabet is bounded from above by the entropy rate of
the original process [4, Ch. 6.3]. Similarly, Watanabe and
Abraham analyzed the rate of information loss for functions
of stationary stochastic processes, also introducing a relative
version of information loss in [5]. Results on the information
loss rate in dynamical systems, together with an upper bound,
were presented in [6].

While these works focus on finite or countable alphabets, [7]
analyzes the absolute and, in case the latter is infinite, relative
information loss induced by passing a real-valued RV through
a memoryless system. In this work we extend [7] to real-valued,
stationary stochastic processes. In particular, we show
that the information loss for RVs distributed according to
the marginal distribution of the process is an upper bound
on the information loss rate (Section III). A similar result is
shown also for the relative information loss rate, although
there the bound is tight in many more cases (Section V):
While redundancy helps to reduce the rate of information
loss, it often fails to reduce the rate of relative information
loss. The connection between the rate of information loss and
the differential entropy rates of the input and output processes
shown in Section III is remarkably similar to the corresponding
result for information loss presented in [7].

In search of processes which are simple to analyze, we
found a set of sufficient conditions such that for a Markovian
input process also the output process has the Markov
property (Section IV). This extends the notion of lumpability
(cf. [8]) from discrete-time and continuous-time, homogeneous
Markov chains to discrete-time, homogeneous, real-valued
Markov processes. These conditions, together with our other
theoretical findings, are illustrated with the help of examples
in Section VI.

II. Preliminaries & Notation 

Throughout this work we consider discrete-time, stationary
stochastic processes $\mathbf{X}$ with alphabet $\mathcal{X} \subseteq \mathbb{R}$. Let $X_n$ be the
$n$-th sample of the process, and let $X_i^j := \{X_i, X_{i+1}, \dots, X_j\}$.
By stationarity, the distribution of $X_n$, $P_{X_n}$, equals the
marginal distribution $P_X$. We assume that $P_X$ is absolutely
continuous w.r.t. the Lebesgue measure, and that it thus
possesses a probability density function (PDF) $f_X$. Similarly,
we assume that for all $n$, the joint PDF $f_{X_1^n}$ and the conditional
PDF $f_{X_n|X_1^{n-1}}$ exist.

Let $H(\cdot)$, $h(\cdot)$, $\bar{H}(\cdot)$, and $\bar{h}(\cdot)$ denote the entropy, the differential
entropy, the entropy rate, and the differential entropy rate
of the RVs and stochastic processes in the argument (see [3]
or [9] for definitions). We assume that the joint differential
entropy of an arbitrary collection of RVs exists and is finite,
and that also the differential entropy rate [9, Thm. 14.7]

$$\bar{h}(\mathbf{X}) := \lim_{n\to\infty} h(X_n | X_1^{n-1}) = \lim_{n\to\infty} \frac{1}{n} h(X_1^n) \qquad (1)$$

exists and is finite. The logarithm for the entropies is taken to
the base 2.
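The two limits in (1) can be made concrete with a small sketch. It assumes a stationary Gaussian AR(1) process $X_n = aX_{n-1} + Z_n$ with innovation variance $\sigma^2$ (the process used later in the examples); the function names are illustrative and not part of the original work. By the chain rule and Markovity, $h(X_1^n) = h(X_1) + (n-1)\,h(Z)$, so the block average approaches $h(Z)$.

```python
import math

def gaussian_entropy_bits(variance):
    """Differential entropy (in bits) of a Gaussian RV with the given variance."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)

def block_entropy_rate_ar1(n, a, sigma2):
    """(1/n) h(X_1^n) for a stationary Gaussian AR(1) process (assumed example).

    Chain rule plus Markovity: h(X_1^n) = h(X_1) + (n - 1) h(X_2 | X_1),
    and h(X_2 | X_1) = h(Z) because Z_n is independent of X_{n-1}.
    """
    h_marginal = gaussian_entropy_bits(sigma2 / (1.0 - a * a))
    h_innovation = gaussian_entropy_bits(sigma2)
    return (h_marginal + (n - 1) * h_innovation) / n

if __name__ == "__main__":
    for n in (1, 10, 100, 10000):
        print(n, block_entropy_rate_ar1(n, a=0.9, sigma2=1.0))
    print("limit h(Z):", gaussian_entropy_bits(1.0))
```

For $n = 1$ the value is the marginal entropy $h(X)$; the gap to the limit shrinks like $1/n$.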



III. Information Loss Rate for Piecewise Bijective
Functions

In this section we devote our attention to a specific class of 
functions for which the preimage of every point of its range 
is an at most countable set: 

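Definition 1 (Piecewise Bijective Function). A piecewise
bijective function $g\colon \mathcal{X} \to \mathcal{Y}$, $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}$, is a surjective,
measurable function defined piecewise on an at most countable
partition $\{\mathcal{X}_i\}$ of its domain:

$$g(x) = \begin{cases} g_1(x), & \text{if } x \in \mathcal{X}_1 \\ g_2(x), & \text{if } x \in \mathcal{X}_2 \\ \;\vdots \end{cases} \qquad (2)$$

where each $g_i\colon \mathcal{X}_i \to \mathcal{Y}_i$ is bijective. Furthermore, the
derivative $g'$ exists on the closures of $\mathcal{X}_i$, and its magnitude
is non-zero $P_X$-a.s.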

Feeding the stationary stochastic process $\mathbf{X}$ through a
memoryless system described by such a function $g$ gives
rise to another stationary stochastic process $\mathbf{Y}$ defined by
$Y_n := g(X_n)$, which, intuitively, conveys less information.
In order to analyze the amount of information lost per sample
we introduce

Definition 2 (Information Loss Rate). The information loss
rate is

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) := \lim_{n\to\infty} \frac{1}{n} L(X_1^n \to Y_1^n) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^n) \qquad (3)$$

i.e., the average of the block information loss.

We showed in [7] that the information loss in systems
described by functions satisfying Definition 1 can be computed
as

$$L(X \to Y) = H(X|Y) = h(X) - h(Y) + \mathbb{E}\{\log |g'(X)|\} \qquad (4)$$

where $Y = g(X)$ and where the expectation is taken w.r.t. $X$.
We now present a corresponding result for stationary stochastic
processes:

Proposition 1 (Information Loss Rate for PBFs). The information
loss rate induced by feeding a stationary stochastic
process $\mathbf{X}$ through a PBF $g$ is

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) = \bar{h}(\mathbf{X}) - \bar{h}(\mathbf{Y}) + \mathbb{E}\{\log |g'(X)|\}. \qquad (5)$$

Proof: For the proof we note that the $n$ RVs $X_1^n$ can
be interpreted as a single, $n$-dimensional RV; similarly, we
can define an extended function $g^n\colon \mathcal{X}^n \to \mathcal{Y}^n$, applying
$g$ coordinate-wise. The Jacobian matrix of $g^n$ is a diagonal
matrix constituted of the elements $g'(x_i)$. With the extension
of (4) to multivariate functions [7] we thus obtain

$$\begin{aligned} L(X_1^n \to Y_1^n) &= h(X_1^n) - h(Y_1^n) + \mathbb{E}\left\{\log \left|\prod_{i=1}^{n} g'(X_i)\right|\right\} \\ &= h(X_1^n) - h(Y_1^n) + n\,\mathbb{E}\{\log |g'(X)|\} \qquad (6) \end{aligned}$$

where the first line is because the determinant of a diagonal
matrix is the product of its diagonal elements, and where we
employed stationarity of $\mathbf{X}$ to obtain the second line. Dividing
by $n$ and taking the limit completes the proof. ■
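As a quick sanity check of the single-RV formula (4), on which (5) is built, the following sketch evaluates $h(X) - h(Y) + \mathbb{E}\{\log|g'(X)|\}$ for the magnitude function $g(x) = |x|$ and a zero-mean Gaussian input. This particular input and the helper names are assumptions made for illustration; the half-normal entropy is the standard closed form. The result is one bit for every variance, matching the intuition that only the sign is lost.

```python
import math

def h_gaussian_bits(sigma2):
    """Differential entropy (bits) of X ~ N(0, sigma2)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma2)

def h_half_normal_bits(sigma2):
    """Differential entropy (bits) of Y = |X| with X ~ N(0, sigma2)."""
    return 0.5 * math.log2(math.pi * math.e * sigma2 / 2.0)

def loss_magnitude_gaussian(sigma2):
    """Evaluate h(X) - h(Y) + E{log|g'(X)|} for g(x) = |x|.

    |g'(x)| = 1 almost surely, so the expected log-derivative is zero.
    """
    expected_log_derivative = 0.0
    return h_gaussian_bits(sigma2) - h_half_normal_bits(sigma2) + expected_log_derivative

if __name__ == "__main__":
    for sigma2 in (0.3, 1.0, 5.0):
        print(sigma2, loss_magnitude_gaussian(sigma2))
```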

In [7] we showed that the information loss of a cascade of
systems equals the sum of the information losses induced in
the systems constituting the cascade. Indeed, this result can be
carried over to the information loss rate as well:

Proposition 2 (Cascade of Systems). Let $\mathbf{X}$ be fed through
a PBF $g$ to obtain $\mathbf{Y}$, and let $\mathbf{Y}$ be fed through a PBF $h$ to
obtain $\mathbf{Z}$. The information loss rate of the cascade is given as
the sum of the individual information loss rates:

$$\bar{L}(\mathbf{X} \to \mathbf{Z}) = \bar{L}(\mathbf{X} \to \mathbf{Y}) + \bar{L}(\mathbf{Y} \to \mathbf{Z}). \qquad (7)$$

Proof: The proof follows from the fact that the cascade
is described by the function $h \circ g$, and that

$$\begin{aligned} \mathbb{E}\{\log |(h \circ g)'(X)|\} &= \mathbb{E}\{\log |g'(X)\, h'(g(X))|\} \\ &= \mathbb{E}\{\log |g'(X)|\} + \mathbb{E}\{\log |h'(Y)|\}. \qquad (8) \end{aligned}$$

■

It is often not possible to obtain closed-form expressions
for the information loss rate induced by a system. Moreover,
estimating the information loss rate by simulations soon suffers
from the curse of dimensionality, as, in principle, infinitely
long random sequences have to be drawn and averaged. Much
simpler is an estimation of the information loss, since a single
realized, sufficiently long sequence allows for an estimation
of the latter. As the next proposition shows, this relatively
simple estimation delivers an upper bound on the information
loss rate:

Proposition 3 (Loss ≥ Loss Rate). Let $\mathbf{X}$ be a stationary
stochastic process and $X$ an RV distributed according to the
process's marginal distribution. The information loss induced
by feeding $X$ through a PBF $g$ is an upper bound on the
information loss rate induced by passing $\mathbf{X}$ through $g$, i.e.,

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) \le L(X \to Y). \qquad (9)$$

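Proof: The inequality holds trivially if $L(X \to Y) = \infty$.
The rest of the proof follows from the chain rule and the fact
that conditioning reduces entropy:

$$\begin{aligned} \bar{L}(\mathbf{X} \to \mathbf{Y}) &= \lim_{n\to\infty} \frac{1}{n} H(X_1^n | Y_1^n) && (10) \\ &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i | X_1^{i-1}, Y_1^n) && (11) \\ &\le \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i | Y_i) && (12) \\ &= L(X \to Y). && (13) \end{aligned}$$

■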



Clearly, this bound is tight whenever the input process $\mathbf{X}$
is an iid process. Moreover, it is trivially tight whenever
the function is bijective, i.e., when $L(X \to Y) = 0$. In
Section VI-C we present an example which renders this bound
tight in the general case.

Intuitively, this bound suggests that redundancy of a process, 
i.e., the statistical dependence of its samples, reduces the 
amount of information lost per sample when fed through a 
deterministic system. The same connection between information
loss and information loss rate has already been observed
in [5] for stationary stochastic processes with finite alphabets.

The next bound again extends a result from [7], bounding
the information loss rate by the entropy rate of a stationary
stochastic process on an at most countable alphabet. As such,
it presents a different way to estimate the information loss rate
efficiently using numerical simulations.

Proposition 4 (Upper Bound). Let $\mathbf{W}$ be a stationary stochastic
process defined by $W_n := i$ if $X_n \in \mathcal{X}_i$. Then,

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) = \bar{H}(\mathbf{W}|\mathbf{Y}) \le \bar{H}(\mathbf{W}). \qquad (14)$$

Proof: We again treat $X_1^n$ as an $n$-dimensional RV; $g^n$
induces a partition of its domain $\mathcal{X}^n$, which is equivalent to
the $n$-fold product of the partition $\{\mathcal{X}_i\}$. Letting $\tilde{W}$ be the
RV obtained by quantizing $X_1^n$ according to this partition, it
is easy to see that $\tilde{W}$ is equivalent to $W_1^n$. Thus, with [7],

$$H(X_1^n | Y_1^n) = H(\tilde{W} | Y_1^n) = H(W_1^n | Y_1^n) \qquad (15)$$

for all $n$. This, together with the fact that conditioning reduces
entropy, completes the proof. ■

For the case that the input process is a Markov process,
i.e., if $f_{X_n|X_1^{n-1}} = f_{X_n|X_{n-1}}$ for all $n$, an additional, sharper
upper bound can be presented:

Proposition 5 (Upper Bound for Markovian $\mathbf{X}$). Let $\mathbf{X}$ be a
Markov process, and let $\mathbf{W}$ be as in Proposition 4. Then, for
finite $L(X \to Y)$,

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) \le H(W_2 | X_1). \qquad (16)$$

Proof: We again apply the chain rule, Markovity of $\mathbf{X}$,
and the fact that conditioning reduces entropy to arrive at

$$H(X_1^n | Y_1^n) \le \sum_{i=1}^{n} H(X_i | X_{i-1}, Y_i). \qquad (17)$$

By stationarity we obtain

$$\begin{aligned} \bar{L}(\mathbf{X} \to \mathbf{Y}) &\le H(X_2 | X_1, Y_2) && (18) \\ &\overset{(a)}{=} H(W_2 | X_1, Y_2) && (19) \\ &\le H(W_2 | X_1) && (20) \end{aligned}$$

where (a) holds since, for all $x \in \mathcal{X}$, $H(X_2 | Y_2, X_1 = x) = H(W_2 | Y_2, X_1 = x)$ [7]. The last inequality is due to
conditioning [3, Thm. 2.6.5] and completes the proof. ■

That the bound is sharper than the one of Proposition 4
follows from observing that

$$H(W_n | X_{n-1}) = \lim_{n\to\infty} H(W_n | X_1^{n-1}) \le \lim_{n\to\infty} H(W_n | W_1^{n-1}) = \bar{H}(\mathbf{W}). \qquad (21)$$

The interpretation of this result is that a function destroys
little information if the process is such that, given the current
sample $X_{n-1}$, the next sample $X_n$ falls within some element
of the partition with a high probability. The question whether,
and under which conditions, this bound is tight is related to
the phenomenon of lumpability and will be answered in the
following section.

IV. Lumpability for Continuous-Valued Markov 
Processes 

It is well known that a function of a Markov process
need not possess the Markov property itself. However, as is
known for Markov chains, there exist conditions on the
function and/or the chain such that the output is Markov.
In [8] this has been termed lumpability and was subsequently
investigated by numerous researchers. While most results are
given for finite Markov chains (e.g., [10], [11]), relatively
little is known in the general case of an uncountable alphabet
(see [12] for an exception). Our small contribution to this
field of research lies in presenting sufficient conditions for
lumpability of continuous-valued Markov processes.

Let $f_{X_n|X_1^{n-1}} = f_{X_n|X_{n-1}}$ for all $n$, i.e., let $\mathbf{X}$ be a Markov
process. We maintain

Proposition 6. If

$$\forall y_1^2 \in \mathcal{Y}^2\colon \forall x \in g^{-1}[y_1]\colon \quad f_{Y_2,X_1}(y_2, x) > 0 \Rightarrow f_{Y_2|X_1}(y_2|x) = f_{Y_2|Y_1}(y_2|y_1) \qquad (22)$$

then $\mathbf{X}$ is lumpable w.r.t. $g$, i.e., $\mathbf{Y}$ is Markov.

Proof: See Appendix. ■

As a corollary, we next make the conditions on the function
$g$, the marginal distribution $f_X$, and the conditional distribution
$f_{X_2|X_1}$ explicit. By adding a further condition, we gain
tightness of Proposition 5 in addition to Markovity:

Corollary 1. If for all $y_1^2 \in \mathcal{Y}^2$ and all $x, x' \in g^{-1}[y_1]$ such
that $f_X(x) > 0$ and $f_X(x') > 0$ the following holds

$$\sum_{x_2 \in g^{-1}[y_2]} \frac{f_{X_2|X_1}(x_2|x)}{|g'(x_2)|} = \sum_{x_2 \in g^{-1}[y_2]} \frac{f_{X_2|X_1}(x_2|x')}{|g'(x_2)|} \qquad (23)$$

then the condition of Proposition 6 is fulfilled and $\mathbf{Y}$ is
Markov.

If additionally, for all $y_2 \in \mathcal{Y}$, all $x$ within the support of
$f_X$, and all $w, w'$ such that $\Pr(W_2 = w | X_1 = x) > 0$ and
$\Pr(W_2 = w' | X_1 = x) > 0$

$$\frac{f_{X_2|X_1}(g_w^{-1}(y_2)|x)}{|g'(g_w^{-1}(y_2))|} = \frac{f_{X_2|X_1}(g_{w'}^{-1}(y_2)|x)}{|g'(g_{w'}^{-1}(y_2))|} \qquad (24a)$$

and

$$\Pr(W_2 = w' | X_1 = x) = \Pr(W_2 = w | X_1 = x) \qquad (24b)$$

then the bound of Proposition 5 holds with equality.

Proof: See Appendix. ■

Proof: See Appendix. ■ 

In Section VI we show some examples for which the output
process $\mathbf{Y}$ is Markov and for which the conditions in (24) are
fulfilled.



V. Relative Information Loss Rate for Functions
Which Reduce Dimensionality

Not all systems can be described by functions satisfying
Definition 1. In particular, a simple quantizer already violates
this definition and suffers from infinite information loss. To
analyze the information processing characteristics of a broader
class of systems, in [7] the notion of relative information
loss was introduced, capturing the percentage of information
available at the input lost in the system. To extend this notion
to stochastic processes, we introduce

Definition 3 (Relative Information Loss Rate). The relative
information loss rate is

$$\bar{l}(\mathbf{X} \to \mathbf{Y}) := \lim_{n\to\infty} l(X_1^n \to Y_1^n) = \lim_{n\to\infty} \lim_{\hat{X}_1^n \to X_1^n} \frac{H(\hat{X}_1^n | Y_1^n)}{H(\hat{X}_1^n)} \qquad (25)$$

whenever the limit exists.

The limit $\hat{X}_1^n \to X_1^n$ is equivalent to $\lim_{k\to\infty} \lfloor 2^k X_1^n \rfloor / 2^k$,
where flooring and scalar multiplication are applied element-wise
(cf. [7]).

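Based on

$$l(X_1^n \to Y_1^n) \le \frac{1}{n} \sum_{i=1}^{n} l(X_i \to Y_i) \overset{(a)}{=} l(X \to Y) \qquad (26)$$

from [7] and from stationarity of $\mathbf{X}$, which yields (a), one
can show that $\bar{l}(\mathbf{X} \to \mathbf{Y}) \le l(X \to Y)$, complementing
Proposition 3. However, in many cases this inequality is an
equality, as we show in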

Proposition 7 (Redundancy won't help). Let $\mathbf{X}$ be a stationary
stochastic process and $X$ an RV distributed according to
the process's marginal distribution. Let further $g$ be defined
on a finite partition $\{\mathcal{X}_i\}$ of $\mathcal{X}$ into non-empty sets as in (2),
where $g_i \in C^\infty$ is either injective or constant (i.e., $g_i(x) = c_i$
for all $x \in \mathcal{X}_i$). Then,

$$\bar{l}(\mathbf{X} \to \mathbf{Y}) = l(X \to Y) = P_X(\mathcal{X}_c) \qquad (27)$$

where $\mathcal{X}_c$ is the union of all elements $\mathcal{X}_i$ of the partition on
which $g$ is constant.

Proof: See Appendix. ■

We conjecture that equality is indeed the "usual"
case, prevailing in most practical scenarios. Thus, while redundancy
can help reduce information loss, it may be useless
when it comes to relative information loss. Applications of this
result may be the scalar quantization of a stochastic process
(leading to a relative information loss rate of 1, i.e., 100% of
the information is lost [7]) and system blocks for multirate
signal processing (see the example in Section VI-D).
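The statement that a scalar quantizer loses 100% of the information can be illustrated with a deliberately simple sketch. It assumes a single RV $X$ uniform on $[0, 1)$, the dyadic approximation $\hat{X}_k = \lfloor 2^k X \rfloor / 2^k$, and a quantizer $Y$ that keeps only the `bits` most significant binary digits of $X$; these choices and the function name are illustrative assumptions, not taken from the original. Then $H(\hat{X}_k) = k$ and $H(\hat{X}_k | Y) = k - \text{bits}$, so the ratio tends to one as $k$ grows.

```python
def relative_loss_uniform_quantizer(k, bits=1):
    """H(Xhat_k | Y) / H(Xhat_k) for X uniform on [0, 1) (assumed example).

    Xhat_k = floor(2^k X) / 2^k takes 2^k equiprobable values, so H(Xhat_k) = k bits.
    Y reveals the `bits` most significant binary digits, leaving k - bits bits unknown.
    """
    if k < bits or k < 1:
        raise ValueError("need k >= max(bits, 1)")
    h_xhat = float(k)
    h_xhat_given_y = float(k - bits)
    return h_xhat_given_y / h_xhat

if __name__ == "__main__":
    for k in (1, 4, 16, 256, 4096):
        print(k, relative_loss_uniform_quantizer(k, bits=1))
```

However many bits the quantizer retains, the fraction of lost information approaches one, since the retained part stays finite while $H(\hat{X}_k)$ grows without bound.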




Fig. 1. AR(1)-process with magnitude function. The input $\mathbf{Z}$ is a sequence
of iid Gaussian RVs with zero mean and variance $\sigma^2$; thus, the process $\mathbf{X}$
is Gaussian with zero mean and variance $\sigma^2 / (1 - a^2)$. The process generator
filter is a first-order all-pole filter with a single pole at $a$.



VI. Examples 

A. AR-Process and Magnitude Function 

In this example we assume that a first-order, zero-mean,
Gaussian auto-regressive process $\mathbf{X}$ is fed through a magnitude
function (see Fig. 1). Let the AR process be generated by the
following difference equation:

$$X_n = a X_{n-1} + Z_n \qquad (28)$$

where $a \in (0, 1)$ and where $Z_n$ are samples drawn independently
from a Gaussian distribution with zero mean and
variance $\sigma^2$. It follows immediately that the process $\mathbf{X}$ is also
zero mean and has variance $\sigma_X^2 = \sigma^2 / (1 - a^2)$ [14, Ex. 6.11]. Let $\mathbf{Y}$
be defined by $Y_n = |X_n|$.
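The difference equation (28) and the stated stationary variance can be checked with a short simulation. This is a minimal sketch using only the standard library; the function names, the seed, and the burn-in length are illustrative assumptions.

```python
import random

def simulate_ar1(n, a, sigma, seed=0, burn_in=1000):
    """Draw n samples of X_n = a * X_{n-1} + Z_n with Z_n ~ N(0, sigma^2).

    The first `burn_in` samples are discarded so that the retained samples
    are close to the stationary regime.
    """
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for t in range(n + burn_in):
        x = a * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            samples.append(x)
    return samples

def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

if __name__ == "__main__":
    a, sigma = 0.5, 1.0
    xs = simulate_ar1(200000, a, sigma)
    ys = [abs(x) for x in xs]  # output of the magnitude function
    print("empirical variance:", sample_variance(xs))
    print("sigma^2 / (1 - a^2):", sigma ** 2 / (1.0 - a ** 2))
```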

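For the sake of brevity we define $\phi(\mu, \sigma^2; x)$ as the PDF of
a Gaussian RV with mean $\mu$ and variance $\sigma^2$, evaluated at $x$.
Thus, we get

$$f_X(x) = \phi(0, \sigma_X^2; x) \qquad (29)$$

and

$$f_{X_2|X_1}(x_2|x_1) = \phi(a x_1, \sigma^2; x_2). \qquad (30)$$

It follows that (23) is satisfied with $|g'(x)| \equiv 1$ and since
$\phi(a x_1, \sigma^2; x_2) = \phi(-a x_1, \sigma^2; -x_2)$,

$$\begin{aligned} \sum_{x_2 \in g^{-1}[y_2]} \frac{f_{X_2|X_1}(x_2|y_1)}{|g'(x_2)|} &= \phi(a y_1, \sigma^2; y_2) + \phi(a y_1, \sigma^2; -y_2) \\ &= \phi(-a y_1, \sigma^2; -y_2) + \phi(-a y_1, \sigma^2; y_2) \\ &= \sum_{x_2 \in g^{-1}[y_2]} \frac{f_{X_2|X_1}(x_2|-y_1)}{|g'(x_2)|}. \qquad (31) \end{aligned}$$

As a consequence, the output process $\mathbf{Y}$ is Markov.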
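The symmetry used to verify condition (23) for this example can be checked numerically. The sketch below evaluates the folded conditional density (the sum over the two preimages $\pm y_2$) at $x_1 = y_1$ and $x_1 = -y_1$; the helper names are illustrative.

```python
import math

def phi(mu, var, x):
    """Gaussian PDF with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def folded_conditional(y2, x1, a, var):
    """Sum over x2 in {y2, -y2} of f_{X2|X1}(x2 | x1) / |g'(x2)| for g(x) = |x|.

    Since |g'(x)| = 1, this is the conditional PDF of Y2 = |X2| given X1 = x1.
    """
    return phi(a * x1, var, y2) + phi(a * x1, var, -y2)

if __name__ == "__main__":
    a, var = 0.7, 1.0
    for y1, y2 in [(0.3, 0.1), (1.5, 2.0), (2.2, 0.4)]:
        left = folded_conditional(y2, y1, a, var)
        right = folded_conditional(y2, -y1, a, var)
        print(y1, y2, left, right)
```

Both preimages $\pm y_1$ of $y_1$ give the same value, which is exactly what (23) requires.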

We performed a series of simulations, as the information
loss rate for this example cannot be expressed in closed form.
Rewriting, e.g., the lower bound on the information loss rate
as

$$\begin{aligned} \bar{L}(\mathbf{X} \to \mathbf{Y}) &\ge h(X_2|X_1) - h(Y_2|X_1) + \mathbb{E}\{\log |g'(X)|\} && (32) \\ &= h(X) - I(X_1; X_2) - h(Y) + I(X_1; Y_2) + \mathbb{E}\{\log |g'(X)|\} && (33) \\ &= L(X \to Y) - I(X_1; X_2) + I(X_1; Y_2) && (34) \end{aligned}$$

allowed us to employ the histogram-based mutual information
estimation from [13] together with $L(X \to Y) = 1$, as shown
in [7]. The upper bound $H(W_2|X_1)$ from Proposition 5 was
computed using numerical integration. In Fig. 2 one can see
that the first-order upper and lower bounds on the information
loss rate from Lemma 1 in the proof of Proposition 6 are
indistinguishable, which suggests that the output process is
indeed Markov. Moreover, it can be seen that a higher value for
the magnitude $a$ of the pole leads to a smaller information loss
rate. This can be explained by the fact that the redundancy
of the process $\mathbf{X}$ increases with increasing $a$, which helps
prevent information loss.

Note that also Watanabe and Abraham defined the fractional information
loss for stochastic processes on finite alphabets [5]; for these types of
processes, however, the relative information loss can be smaller or larger
than the information loss.

Fig. 2. Information loss rate of an AR(1)-process $\mathbf{X}$ in a magnitude function
as a function of the pole $a$ of the process generator difference equation. A
larger pole, leading to a higher redundancy of $\mathbf{X}$, reduces the information
loss rate.

Generally, while redundancy reduces the information loss
rate compared to an iid process (cf. Proposition 3), it is
not necessarily true that more redundancy leads to a smaller
information loss rate than less redundancy. Indeed, one can 
generate examples where a process with a higher redundancy 
suffers from a higher information loss rate than a process 
with less redundancy. This suggests that the redundancy of 
a process has to be matched to the function g in order to 
efficiently prevent information from being lost; in that sense, 
this parallels the field of channel coding, where the code needs 
to be matched to the characteristics of the channel (noise, 
fading, burst errors) in order to successfully reduce the bit 
error rate. 
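For the AR(1) example, the upper bound $H(W_2|X_1)$ of Proposition 5 can be approximated by one-dimensional numerical integration. The sketch below is one possible way to do this and is not the authors' procedure: it uses $\Pr(X_2 < 0 | X_1 = x) = \Phi(-ax/\sigma)$, which follows from (30), and a trapezoidal rule on a truncated grid; grid size, truncation, and names are assumptions.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, with the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def std_normal_cdf(z):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_bound_ar1(a, sigma=1.0, grid=20001, span=8.0):
    """Trapezoidal approximation of H(W2|X1) = E{ H_b( Pr(X2 < 0 | X1) ) }.

    X1 ~ N(0, sigma^2 / (1 - a^2)); the integration range is +/- span standard
    deviations, which is an assumption of this sketch.
    """
    sigma_x = sigma / math.sqrt(1.0 - a * a)
    lo, hi = -span * sigma_x, span * sigma_x
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        pdf = math.exp(-x * x / (2.0 * sigma_x ** 2)) / (sigma_x * math.sqrt(2.0 * math.pi))
        p_negative = std_normal_cdf(-a * x / sigma)
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * pdf * binary_entropy(p_negative) * step
    return total

if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5, 0.75, 0.9):
        print(a, upper_bound_ar1(a))
```

For $a = 0$ the process is iid and the bound equals one bit; it decreases as the pole $a$ grows, in line with the trend described for Fig. 2.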

B. Cyclic Random Walk 

We next consider a scenario where our process is a cyclic
random walk on a subset $[-M, M]$ of the real line. Assume
that for a given state $x_1$ the following state is uniformly
distributed on a cyclically shifted subset of $[-M, M]$ of length
$2a < 2M$, i.e.,

$$f_M(x_2|x_1) := f_{X_2|X_1}(x_2|x_1) = \begin{cases} \frac{1}{2a}, & \text{if } d(x_2, x_1) \le a \\ 0, & \text{else} \end{cases} \qquad (35)$$

where $d(x, y) = \min_k |x - y - 2kM|$. Intuitively, $X_n$ is the
sum of $n$ independent RVs uniformly distributed on $[-a, a]$,
where sums outside of $[-M, M]$ are mapped back into this
interval via the modulo operation. It is easy to verify that
the marginal distribution of $\mathbf{X}$ is the uniform distribution,
i.e., $f_X(x) = \frac{1}{2M}$ for all $x \in [-M, M]$ and zero otherwise.
The function we feed the process through shall again be the
magnitude function, i.e., $Y_n = |X_n|$.
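Since $d(x, y) = d(-x, -y)$ and since $|g'(x)| = 1$ for all
$x$, it follows that (23) is fulfilled, and that thus $\mathbf{Y}$ is Markov.
Moreover, we have $\bar{h}(\mathbf{Y}) = h(Y_2|X_1)$, and obtain for the
information loss rate with Proposition 1

$$\begin{aligned} \bar{L}(\mathbf{X} \to \mathbf{Y}) &= \bar{h}(\mathbf{X}) - \bar{h}(\mathbf{Y}) + \mathbb{E}\{\log |g'(X)|\} && (36) \\ &= h(X_2|X_1) - h(Y_2|X_1) && (37) \\ &= \int_{-M}^{M} \int_{-M}^{M} f_X(x_1) f_M(x_2|x_1) \log \frac{f_M(x_2|x_1) + f_M(-x_2|x_1)}{f_M(x_2|x_1)} \, dx_2 \, dx_1 && (38) \\ &= \int_{-M}^{M} \int_{-M}^{M} \frac{f_M(x_2|x_1)}{2M} \log \left( 1 + \frac{f_M(-x_2|x_1)}{f_M(x_2|x_1)} \right) dx_2 \, dx_1. && (39) \end{aligned}$$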



The logarithm evaluates to zero if $f_M(-x_2|x_1) = 0$ and to
one otherwise (the logarithm is taken to base 2). Therefore,
we can write

$$\bar{L}(\mathbf{X} \to \mathbf{Y}) = \frac{4a}{2M} \int_{0}^{M} \int_{-M}^{M} f_M(x_2|x_1) f_M(-x_2|x_1) \, dx_2 \, dx_1 \qquad (40)$$

where we exploited the symmetry of $f_M$. It can be shown
that the integral evaluates to $\frac{1}{2}$, so the information loss rate is
$\bar{L}(\mathbf{X} \to \mathbf{Y}) = \frac{a}{M}$.

This result has a nice geometric interpretation: It quantifies
the expected overlap of two segments of length $2a$ randomly
placed on a circle with circumference $2M$; due to the modulo
operation the point $-M$ is equivalent to the point $M$, and the
conditional PDFs $f_M(x_2|x_1)$ and $f_M(-x_2|x_1)$ represent the
segments (see Fig. 3).
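The overlap interpretation lends itself to a Monte Carlo check. Since the logarithm in (39) is one bit whenever $f_M(-x_2|x_1) > 0$ and zero otherwise, the loss rate equals the probability of that event, which should come out as $a/M$. The sketch below estimates this probability; the sampling scheme, the seed, and the function names are assumptions made for illustration.

```python
import random

def wrap(x, M):
    """Map x into [-M, M) via the modulo operation."""
    return (x + M) % (2.0 * M) - M

def circular_distance(x, y, M):
    """d(x, y) = min_k |x - y - 2kM|."""
    r = (x - y) % (2.0 * M)
    return min(r, 2.0 * M - r)

def estimate_loss_rate(a, M, n=200000, seed=1):
    """Monte Carlo estimate of Pr( f_M(-X2 | X1) > 0 ) in bits per sample.

    X1 is uniform on [-M, M]; X2 given X1 is uniform on the arc of half-width a
    around X1. The event is equivalent to d(-X2, X1) <= a.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x1 = rng.uniform(-M, M)
        x2 = wrap(x1 + rng.uniform(-a, a), M)
        if circular_distance(-x2, x1, M) <= a:
            hits += 1
    return hits / n

if __name__ == "__main__":
    M = 3.0
    for a in (0.5, 1.0, 2.0, 3.0):
        print(a, estimate_loss_rate(a, M), a / M)
```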

Finally, we evaluated the upper bound from Proposition 5.
Letting $\mathcal{X}_1 = [-M, 0)$ and $\mathcal{X}_2 = [0, M]$ and abbreviating
$p(1|x) := \Pr(W_2 = 1 | X_1 = x)$ we obtained



i-M-a 



fMix2\xi) := /jf2|Xi(a^2ki) 



(41) 



The redundancy is defined as the difference between the entropy of the
marginal distribution and the entropy rate of the process. The former increases
due to increasing variance $\sigma_X^2$, while the latter remains constant and equal to



HZ) (cf. Ql). 



-Af < a; < -Af + a 
— Af + a < X < —a 
—a < X < a 
a < X < M ~ a 
M -a< X < M 



The discrete-valued equivalent is a Markov chain with a doubly stochastic
transition matrix, for which it is known that the stationary distribution is the
uniform distribution [9, p. 732].




Fig. 3. Interpreting the information loss rate of a cyclic random walk in a
magnitude function. The depicted scenario corresponds to $a = M/3$.

Fig. 4. Information loss rate of a cyclic random walk $\mathbf{X}$ on $[-M, M]$ in a
magnitude function as a function of the support $[-a, a]$ of the uniform input
PDF.



if M > 2a and 



' a — M—x 
la ' 
2a-M 
2a ' 



p(i|^) = s^- 



M 
2a' 

g+Af-a: 
2a ' 



-M <X < -a 

-a< X < -M + a 

-M + a<x < M -a (42) 

M — a < X < a 

a< X <M 



if $M < 2a$. (Naturally, $p(2|x) = 1 - p(1|x)$.) Computing
the entropy $H(W_2|X_1 = x)$ based on these probabilities and
taking the expectation w.r.t. $X_1$ yields

$$H(W_2|X_1) = \begin{cases} \dfrac{a}{M \ln 2}, & M > 2a \\[2mm] \dfrac{M - a}{M \ln 2} + \log \dfrac{2a}{M}, & M < 2a. \end{cases} \qquad (43)$$



The analytic result for the information loss rate and the
bound, numerically validated using the same procedure as in
Section VI-A, are depicted in Fig. 4.
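The bound $H(W_2|X_1)$ for the cyclic random walk can also be cross-checked numerically without relying on the piecewise expressions for $p(1|x)$. The sketch below computes, for each $x$, the fraction of the arc $[x - a, x + a]$ (taken modulo $2M$) that falls into $[-M, 0)$, averages the binary entropy over a uniform $X_1$ with a midpoint rule, and compares the result with the closed form $a/(M \ln 2)$ for $M \ge 2a$ and $(M - a)/(M \ln 2) + \log(2a/M)$ for $M < 2a$. The closed form is our reading of (43), and all names and the grid size are assumptions.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, with the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def prob_negative(x, a, M):
    """Fraction of the arc [x - a, x + a], modulo 2M, that lies in [-M, 0)."""
    lo, hi = x - a, x + a
    length = 0.0
    for k in (-1, 0, 1):  # unwrapped copies of the interval [-M, 0)
        left, right = -M + 2.0 * k * M, 2.0 * k * M
        length += max(0.0, min(hi, right) - max(lo, left))
    return length / (2.0 * a)

def bound_numeric(a, M, cells=20000):
    """Midpoint-rule approximation of E{ H_b(p(1|X1)) } for X1 uniform on [-M, M]."""
    step = 2.0 * M / cells
    total = 0.0
    for i in range(cells):
        x = -M + (i + 0.5) * step
        total += binary_entropy(prob_negative(x, a, M)) * step
    return total / (2.0 * M)

def bound_closed_form(a, M):
    """Closed-form candidate for H(W2|X1), valid for 0 < a <= M (assumed reading of (43))."""
    if M >= 2.0 * a:
        return a / (M * math.log(2.0))
    return (M - a) / (M * math.log(2.0)) + math.log2(2.0 * a / M)

if __name__ == "__main__":
    for a, M in [(1.0, 3.0), (1.0, 2.0), (0.8, 1.0), (1.0, 1.0)]:
        print(a, M, bound_numeric(a, M), bound_closed_form(a, M))
```

For $a = M$ the process is iid and the bound equals one bit, which coincides with the loss rate $a/M$ in that case.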

C. An Example illustrating the Tightness of the Bounds 

The following example illustrates the tightness of the presented
bounds and also satisfies the condition of lumpability.
Assume that $\mathbf{X}$ is a Markov process with conditional distribution

$$f_M(x_2|x_1) = \frac{1}{2} \begin{cases} \mathbb{I}_{[1,2)}(x_2) + \mathbb{I}_{[3,4)}(x_2), & x_1 \in [0, 1) \cup [2, 3) \\ \mathbb{I}_{[0,1)}(x_2) + \mathbb{I}_{[2,3)}(x_2), & x_1 \in [1, 2) \cup [3, 4) \end{cases} \qquad (44)$$

where $\mathbb{I}_A(x) = 1$ iff $x \in A$. As can be shown easily,
the stationary distribution is the uniform distribution on $[0, 4)$;
thus, $h(X) = \log 4 = 2$ and $\bar{h}(\mathbf{X}) = 1$.

We analyze the system mapping the interval $\mathcal{X}_2 = [2, 4)$
onto the interval $\mathcal{X}_1 = [0, 2)$, i.e.,

$$g(x) = \begin{cases} x, & x \in [0, 2) \\ x - 2, & x \in [2, 4) \end{cases} \qquad (45)$$

which yields the conditional distribution of $Y_2$ given $X_1$

$$f_{Y_2|X_1}(y_2|x_1) = \begin{cases} \mathbb{I}_{[1,2)}(y_2), & x_1 \in [0, 1) \cup [2, 3) \\ \mathbb{I}_{[0,1)}(y_2), & x_1 \in [1, 2) \cup [3, 4). \end{cases} \qquad (46)$$

The derivative of this function is identical to one, and the stationary
distribution of the output process $\mathbf{Y}$ is the uniform distribution
on $[0, 2)$; thus, $h(Y) = \log 2 = 1$ and $L(X \to Y) = 1$.

The output process $\mathbf{Y}$ can be shown to be Markov: Assuming
$X_1 = x \in [0, 1)$, it follows that $x' \in [2, 3)$; since
these conditions are equivalent in the definition of $f_M$, (23)
is fulfilled.

From $f_M$ one can see that $\Pr(W_2 = 1 | X_1 = x) =
\Pr(W_2 = 2 | X_1 = x) = \frac{1}{2}$ regardless of $x$, which satisfies
(24b) and renders the upper bound from Proposition 5
as

$$H(W_2|X_1) = 1. \qquad (47)$$

The bound can be shown to be tight, since also (24a) is
fulfilled: Given, e.g., $X_1 = x \in [0, 1)$ and $Y_2 = y \in [1, 2)$, it
follows that $X_2 \in \{y, y + 2\}$ and $f_M(y|x) = f_M(y + 2|x) = \frac{1}{2}$.
Thus, we are led to the following conclusion:

$$1 = L(X \to Y) \ge \bar{L}(\mathbf{X} \to \mathbf{Y}) = H(W_2|X_1) = 1. \qquad (48)$$

This is an example not only for tightness of Proposition 5
but also of Proposition 3. Interestingly, neither is the function
information-preserving, nor is the input process $\mathbf{X}$ iid. Consequently,
one can interpret this example as a worst case, where
redundancy is not matched to the system (the "channel"),
failing to alleviate the adverse effects of the system.
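The behavior of this example can be reproduced with a short simulation. The sketch below assumes the conditional PDF as read from (44) (each of the two admissible unit intervals is chosen with probability 1/2) and the function $g$ that shifts $[2, 4)$ onto $[0, 2)$; the names and the seed are illustrative. It shows that the output alternates deterministically between $[0, 1)$ and $[1, 2)$, while the lost partition index remains a fair coin flip.

```python
import random

def step(x1, rng):
    """Draw X2 given X1 = x1: uniform on two unit intervals, each with probability 1/2."""
    if 0.0 <= x1 < 1.0 or 2.0 <= x1 < 3.0:
        base = rng.choice((1, 3))
    else:
        base = rng.choice((0, 2))
    return base + rng.random()

def g(x):
    """System function: identity on [0, 2), shift by -2 on [2, 4)."""
    return x if x < 2.0 else x - 2.0

def simulate(n, seed=0):
    """Simulate n samples of the Markov process, started in the uniform distribution."""
    rng = random.Random(seed)
    x = 4.0 * rng.random()
    xs = []
    for _ in range(n):
        xs.append(x)
        x = step(x, rng)
    return xs

if __name__ == "__main__":
    xs = simulate(20000)
    ys = [g(x) for x in xs]
    alternates = all(int(ys[i]) + int(ys[i + 1]) == 1 for i in range(len(ys) - 1))
    upper_fraction = sum(1 for x in xs if x >= 2.0) / len(xs)
    print("output alternates between [0,1) and [1,2):", alternates)
    print("fraction of samples in [2,4):", upper_fraction)
```

The interval of $Y_n$ is fully predictable from the past, so the dependence of the process carries no information about which of the two preimages $\{y, y + 2\}$ occurred, consistent with one bit being lost per sample.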

D. Multirate Systems 

Although strictly speaking not time-invariant, multirate
systems can also be analyzed with the proposed relative information
loss rates. In particular, we will show that for an $M$-fold
downsampler, which is described by the input-output relation
$Y_n = X_{nM}$, the relative information loss rate equals

$$\bar{l}(\mathbf{X} \to \mathbf{Y}) = \frac{M - 1}{M}. \qquad (49)$$

To this end, note that the stationary output process $\mathbf{Y}$ is
equivalent to the cyclo-stationary process $\tilde{\mathbf{Y}}$, whose samples
are defined as

$$\tilde{Y}_n = \begin{cases} X_n, & \text{if } n/M \in \mathbb{Z} \\ 0, & \text{else.} \end{cases} \qquad (50)$$

In essence, the function in (50) implements a projection onto a
subspace of lower dimensionality. For this type of function we showed
in [7] that the relative information loss is related to the
information dimension of the output, which in our case is given
by the number of its non-zero entries, i.e., by

$$d(\tilde{Y}_1^n) = \left\lfloor \frac{n}{M} \right\rfloor. \qquad (51)$$

With $d(X_1^n) = n$ and by the fact that $\lfloor n/M \rfloor = n/M - \{n/M\}$,
where $\{\cdot\}$ denotes the fractional part, we obtain

$$\lim_{n\to\infty} l(X_1^n \to \tilde{Y}_1^n) = 1 - \lim_{n\to\infty} \frac{n/M - \{n/M\}}{n} = 1 - \frac{1}{M} = \frac{M - 1}{M}. \qquad (52)$$

The second equality follows because the magnitude of the
fractional part is bounded by unity and, thus, this term
vanishes in the limit.
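The convergence in (52) is easy to tabulate. The following sketch evaluates $1 - \lfloor n/M \rfloor / n$ for growing block lengths $n$; the function name is an illustrative assumption.

```python
def relative_loss_block(n, M):
    """1 - floor(n / M) / n: relative loss for a block of n input samples
    passed through an M-fold downsampler (n - floor(n/M) of n dimensions are discarded)."""
    if n < 1 or M < 1:
        raise ValueError("n and M must be positive integers")
    return 1.0 - (n // M) / n

if __name__ == "__main__":
    M = 3
    for n in (2, 3, 10, 100, 10**6):
        print(n, relative_loss_block(n, M))
    print("limit (M - 1) / M:", (M - 1) / M)
```

For $n < M$ no sample survives and the block loss is one; the influence of the fractional part decays like $1/n$.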

VII. Conclusion 

In this work we extended previous results about the in- 
formation loss induced by deterministic, memoryless input- 
output systems from random variables to stationary stochastic 
processes with continuous distribution. Notably, we showed 
a connection between the rate of information loss and the 
differential entropy rates of the input and output processes 
for a special class of functions. While redundancy decreases 
the information loss rate for this class of systems, systems 
which destroy an infinite amount of information do not benefit 
from redundancy of the process in most practical cases. 
Future investigations shall focus on the extension to systems 
with memory and on the problem of reconstructing the input 
process. 

As side results, we presented sufficient conditions for the 
Markovian input process and the system function such that the 
output process is Markov. 

Appendix A 
Proofs 

A. Proof of Proposition 6

Note that a possible definition of Markovity is given by

Definition 4 (Markov Process [15, II.6, p. 80]). A process $\mathbf{X}$
is a Markov process iff for all $i \in \mathbb{N}$, $a \in \mathbb{R}$, and integers
$n_1 < n_2 < \dots < n_i < n$, with probability one,

$$\Pr(X_n \le a | X_{n_1} = x_{n_1}, \dots, X_{n_i} = x_{n_i}) = \Pr(X_n \le a | X_{n_i} = x_{n_i}). \qquad (53)$$

Clearly, a process is Markov if, for all $n$,

$$f_{X_n|X_1^{n-1}} = f_{X_n|X_{n-1}} \qquad (54)$$

holds $P_{X_1^{n-1}}$-a.s., because (53) results from integrating the
densities over $(-\infty, a]$.

The proof of the proposition follows along the same lines as
the proof for Markov chains given in [16], and is built on the
following lemma, which is an extension of [3, Thm. 4.5.1]:

Lemma 1 (Bounds on the differential entropy rate). Let $\mathbf{X}$
be a stationary Markov process with differential entropy rate
$\bar{h}(\mathbf{X}) = h(X_2|X_1)$ and let $\mathbf{Y}$ be a stationary process derived
from $\mathbf{X}$ by $Y_n = g(X_n)$. Then,

$$h(Y_n | Y_2^{n-1}, X_1) \le \bar{h}(\mathbf{Y}) \le h(Y_n | Y_1^{n-1}). \qquad (55)$$

Proof: The upper bound follows from the fact that conditioning
reduces entropy, so we only have to show the lower
bound. For this, note that by Markovity of $\mathbf{X}$,

$$h(Y_n | Y_2^{n-1}, X_1) = h(Y_n | Y_2^{n-1}, X_k^1) \qquad (56)$$

for all $k < 1$. Let $U_k = (Y_2^{n-1}, X_k^1)$ and $V_k = Y_k^{n-1}$.
Obviously, there exists a function $f$ such that $V_k = f(U_k)$,
namely the function which is the identity function on the first
$n - 2$, and the function $g$ on the last $2 - k$ elements. By showing
that

$$h(Y_n | U_k) \le h(Y_n | V_k) \qquad (57)$$

the lower bound is proved by [9, Thm. 14.7]

$$h(Y_n | Y_2^{n-1}, X_1) = \lim_{k\to-\infty} h(Y_n | U_k) \le \lim_{k\to-\infty} h(Y_n | V_k) = \bar{h}(\mathbf{Y}). \qquad (58)$$

Thus, we write

$$\begin{aligned} h(Y_n | V_k) - h(Y_n | U_k) &= h(Y_n, V_k) - h(V_k) - h(Y_n, U_k) + h(U_k) && (59) \\ &\overset{(a)}{=} H(U_k | V_k) - \mathbb{E}\{\log |\det J_f(U_k)|\} \\ &\quad - H(U_k, Y_n | V_k, Y_n) + \mathbb{E}\{\log |\det J_f(U_k)|\} && (60) \\ &= H(U_k | V_k) - H(U_k | V_k, Y_n) && (61) \\ &\ge 0 && (62) \end{aligned}$$

where (a) is due to the multivariate extension of (4) and since the
determinant of the Jacobian matrix is the same for the function
$f$, and for a function which applies $f$ to some, and the identity
function to the rest of the elements. This completes the proof.

■
We now turn to the

Proof of Proposition 6: Note that the assumption implies
that

$$\iint f_{Y_2,X_1}(y, x) \log \frac{f_{Y_2|X_1}(y|x)}{f_{Y_2|Y_1}(y|g(x))} \, dy \, dx = h(Y_2|Y_1) - h(Y_2|X_1) = 0 \qquad (63)$$

which renders the bounds of Lemma 1 equal for $n = 2$.
Thus, $\bar{h}(\mathbf{Y}) = h(Y_n | Y_1^{n-1}) = h(Y_2|Y_1)$ for all $n$. By
stationarity,

$$\begin{aligned} 0 &= h(Y_n | Y_{n-1}) - h(Y_n | Y_1^{n-1}) && (64) \\ &= I(Y_n; Y_1^{n-2} | Y_{n-1}) && (65) \\ &= \mathbb{E}\left\{ \log \frac{f_{Y_n, Y_1^{n-2} | Y_{n-1}}(Y_n, Y_1^{n-2} | Y_{n-1})}{f_{Y_n|Y_{n-1}}(Y_n|Y_{n-1}) \, f_{Y_1^{n-2}|Y_{n-1}}(Y_1^{n-2}|Y_{n-1})} \right\} && (66) \\ &= \mathbb{E}\left\{ \log \frac{f_{Y_n|Y_1^{n-1}}(Y_n|Y_1^{n-1})}{f_{Y_n|Y_{n-1}}(Y_n|Y_{n-1})} \right\} && (67) \\ &= \mathbb{E}\left\{ D\left( f_{Y_n|Y_1^{n-1}}(\cdot|Y_1^{n-1}) \,\|\, f_{Y_n|Y_{n-1}}(\cdot|Y_{n-1}) \right) \right\} && (68) \end{aligned}$$

where $D(\cdot\|\cdot)$ is the Kullback-Leibler divergence and where
in the last line the expectation is taken w.r.t. $Y_1^{n-1}$.

The expectation of a non-negative RV, such as the Kullback-Leibler
divergence above, can only be zero if this RV is almost
surely zero. Together with the fact that the Kullback-Leibler
divergence between two PDFs vanishes iff the PDFs are equal
almost everywhere, the assumption of the proposition implies
that

$$f_{Y_n|Y_1^{n-1}} = f_{Y_n|Y_{n-1}} \qquad (69)$$

$P_{Y_1^{n-1}}$-a.s. But this implies Markovity by Definition 4
(cf. (54)) and completes the proof. ■

B. Proof of Corollary 1

Note that (23) implies $f_{Y_2|X_1}(y_2|x) = f_{Y_2|X_1}(y_2|x')$ for all
$x, x'$ within the support of $f_X$. Now



where p{w\x) — Pr(W2 — w\Xi — x). Let, for a given x, 
w satisfy p{w\x) > 0. The proof is completed by recognizing 
that 



fY2\X^{y\x) 
^^p{w\x)fY2\W2,xAy\'^^^^) 



fx2\xA9wHy)\x) 



? \9'i9^\y))\ 
_ fx2\x^{9:a^{y)\x) 



(79) 
(80) 
(81) 



\9'i9^\y))\ 

(b) /x2|Xife^(y)|a;) 1 



Y^[p{w\x) > 0] 

csird{{w: p{w\x) > 0}) (82) 



fY2\YAy2\yi) = 



hiyi) 



E 



fY2\Xt{y2\xi)fx{xi) 



(83) 
(84) 



xieg Mai] 



\9'{x,)\ 



(70) 

Let c/+^[yi] := {a; G g^\yi] ■ fx{x) > 0} and let x be an 
arbitrary element of this set. We proceed 



fY2\Y^iy2\yl) 

1 



fYiyi) 



y- fY2\xAy2\xi)fx{Xl) 



2:163+ [yi] 

(a) fY2\xAy2\x) 



\9'ixi)\ 
fxjxi) 

fYiyi) ^^ \9'{xi)\ 

fY2\xAy2\x) 



E 



(72) 



(73) 



where (a) is due to (|23] |. Since fY2,Xi — fY2\Xifx we can 
apply Proposition |6] to complete the first part of the proof. 
For the second part, note that we have with Proposition |6] 



h (Y) = h{Y2\Xi) 
and thus, with Proposition [T] and dU, 



(74) 



L(X ^ Y) = hiX2\Xi) - h{Y2\Xi) + E {log \g'iX)\} 

= H{X2\Xi,Y2). (75) 

It remains to show that ( |24] | implies equality in (|20] | in the 
proof of Proposition |5] To this end, observe that 



\9'i9^u^iy))\ piMx) 

= fY2\W2.Xriy\w,x) 

where (a) is due to (I24ab and (b) is due to (I24bl i. ■ 

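C. Proof of Proposition 7

We start with showing $l(X \to Y) = P_X(\mathcal{X}_c)$. To this
end, letting $d(Z)$ denote the Rényi information dimension
of $Z$ [17] and employing [18], [19], we write

$$d(X | Y = y) = \sum_{i=1}^{K} d(X | Y = y, X \in \mathcal{X}_i) \, P_{X|Y=y}(\mathcal{X}_i) \qquad (85)$$

where $K = \mathrm{card}(\{\mathcal{X}_i\})$. W.l.o.g., the partition is indexed such
that the first $L$ elements correspond to subsets $\mathcal{X}_i$ on which $g$
is constant. Thus, $\mathcal{X}_c = \bigcup_{i=1}^{L} \mathcal{X}_i$. It follows for $i > L$, from
the bijectivity of $g_i$, that $d(X | Y = y, X \in \mathcal{X}_i) = 0$. Moreover,
if for $i \le L$ we have $X \in \mathcal{X}_i$, it follows that $Y = c_i$, and that
thus

$$\begin{aligned} d(X | Y = y) &= \sum_{i=1}^{L} d(X | Y = y, X \in \mathcal{X}_i) \, P_{X|Y=y}(\mathcal{X}_i) && (86) \\ &= \sum_{i=1}^{L} d(X | X \in \mathcal{X}_i) \, P_{X|Y=y}(\mathcal{X}_i) && (87) \\ &\overset{(a)}{=} \sum_{i=1}^{L} P_{X|Y=y}(\mathcal{X}_i) && (88) \\ &\overset{(b)}{=} P_{X|Y=y}(\mathcal{X}_c) && (89) \end{aligned}$$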



-ff(M^2|^l) - -ff(M^2|^l,>"2) - /(W^2;>"2|^i; 



(76) 



vanishes if for all y G 3^ and all a; G A" such that fx (x) > 
and for all w such that Pr(W2 = w\Xi — x) > 



fY2\xAy\^) = fY2\w2,xdy\W'^)- 



(77) 



But 



r /IN fx2\xA9w {y)\x) ,„„, 

fY2\W2,xdy\w,x)^ _ (78) 



where (a) is due to the fact that the RV $X$ restricted to the
non-empty set $\mathcal{X}_i$ possesses a density and (b) follows from
the fact that the partition consists only of disjoint sets. We now
combine $d(X) = 1$ and

$$d(X|Y) = \int_{\mathcal{Y}} d(X | Y = y) \, dP_Y(y) = P_X(\mathcal{X}_c) \qquad (91)$$

with the fact that $l(X \to Y) = \frac{d(X|Y)}{d(X)}$ [7] and obtain the first
part of the proof.






-Pxi^iX, X Xj' ') + -Px^iXc X A-e X A-," ') 
n ^ n ^ 



n ^ 



(90) 



-Px^ixJ' 'xX,) 



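Now take a finite sequence $X_1^n$ obtained from the stochastic
process $\mathbf{X}$ and look at the relative information loss incurred
in $g$. Similarly as in the proof of Proposition 4, $g^n$ induces a
finite partition of $\mathcal{X}^n$. Moreover, for every element of this
partition, $g^n$ is a composition of a bijective, differentiable
function and, possibly, a projection. We can thus apply the
result about dimensionality reduction presented in [7], which
leads to (90), where $\bar{\mathcal{X}}_c = \mathcal{X} \setminus \mathcal{X}_c$. Compactly written, we get

$$l(X_1^n \to Y_1^n) = \frac{1}{n} \sum_{i=1}^{n} i \, \Pr\left( \mathrm{card}(\{X_j \in X_1^n : X_j \in \mathcal{X}_c\}) = i \right). \qquad (92)$$

Defining

$$V_n := \begin{cases} 1, & \text{if } X_n \in \mathcal{X}_c \\ 0, & \text{else} \end{cases} \qquad (93)$$

and $Z_n := \sum_{j=1}^{n} V_j$, and with the linearity of expectation, we
get

$$\begin{aligned} l(X_1^n \to Y_1^n) &= \frac{1}{n} \sum_{i=1}^{n} i \, \Pr\left( \mathrm{card}(\{X_j \in X_1^n : X_j \in \mathcal{X}_c\}) = i \right) && (94) \\ &= \frac{1}{n} \sum_{i=1}^{n} i \, \Pr(Z_n = i) && (95) \\ &= \frac{1}{n} \mathbb{E}\{Z_n\} && (96) \\ &= \frac{1}{n} \sum_{j=1}^{n} \mathbb{E}\{V_j\} && (97) \\ &\overset{(a)}{=} \mathbb{E}\{V\} && (98) \\ &= P_X(\mathcal{X}_c) && (99) \end{aligned}$$

where (a) is due to stationarity of $\mathbf{X}$. This completes the proof.

■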